Low-Resource Speech Recognition of 500-Word Vocabularies

Authors

  • Sabine Deligne
  • Ellen Eide
  • Ramesh Gopinath
  • Dimitri Kanevsky
  • Benoit Maison
  • Peder Olsen
  • Harry Printz
  • Jan Sedivy
Abstract

We describe techniques for enhancing the accuracy, efficiency and features of a low-resource, medium-vocabulary, grammar-based speech recognition system. Among the issues and techniques we explore are front-end speech/silence detection to reduce computational workload, the use of the Bayesian information criterion (BIC) to build smaller and better acoustic models, the minimization of finite-state grammars, the use of hybrid maximum likelihood and discriminative models, and the automatic generation of baseforms from single new-word utterances. We report word error rate (WER) figures throughout, as appropriate.
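
The abstract reports results as WER without defining it. As a rough illustration only, WER is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below assumes whitespace tokenization and equal cost for substitutions, deletions and insertions; the function name and the sample sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution and one deletion against 5 reference words.
    print(word_error_rate("call john smith at home", "call jon smith home"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.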

Similar resources

Sub-word modeling for automatic speech recognition

Modern automatic speech recognition systems handle large vocabularies of words, making it infeasible to collect enough repetitions of each word to train individual word models. Instead, large-vocabulary recognizers represent each word in terms of sub-word units. Typically the sub-word unit is the phone, a basic speech sound such as a single consonant or vowel. Each word is then represented as a...

Web Data Selection Based on Word Embedding for Low-Resource Speech Recognition

The lack of transcription files will lead to a high out-of-vocabulary (OOV) rate and a weak language model in low-resource speech recognition systems. This paper presents a web data selection method to augment these systems. After mapping all the vocabularies or short sentences to vectors in a low-dimensional space through a word embedding technique, the similarities between the web data and the ...

Speech recognition using sub-word units dependent on phonetic contexts of both training and recognition vocabularies

This paper proposes a new speech recognition algorithm using a new context-dependent recognition unit design method for efficient and precise acoustic modeling. This algorithm uses both training and recognition vocabularies to select context-dependent units which precisely represent acoustic variations due to phonetic contexts in a recognition vocabulary. An efficient training algorithm for selecte...

Speech recognition for huge vocabularies by using optimized sub-word units

This paper describes approaches for decomposing words of huge vocabularies (up to 2 million) into smaller particles that are suitable for a recognition lexicon. Results on a Finnish dictation task and a flat list of German street names are given.

Speech Recognition Experiments with Perceptrons

Artificial neural networks (ANNs) are capable of accurate recognition of simple speech vocabularies such as isolated digits [1]. This paper looks at two more difficult vocabularies, the alphabetic E-set and a set of polysyllabic words. The E-set is difficult because it contains weak discriminants and polysyllables are difficult because of timing variation. Polysyllabic word recognition is aided...

Publication date: 2001